Consistent group selection in high-dimensional linear regression.

نویسندگان

  • Fengrong Wei
  • Jian Huang
چکیده

In regression problems where covariates can be naturally grouped, the group Lasso is an attractive method for variable selection since it respects the grouping structure in the data. We study the selection and estimation properties of the group Lasso in high-dimensional settings when the number of groups exceeds the sample size. We provide sufficient conditions under which the group Lasso selects a model whose dimension is comparable with the underlying model with high probability and is estimation consistent. However, the group Lasso is, in general, not selection consistent and also tends to select groups that are not important in the model. To improve the selection results, we propose an adaptive group Lasso method which is a generalization of the adaptive Lasso and requires an initial estimator. We show that the adaptive group Lasso is consistent in group selection under certain conditions if the group Lasso is used as the initial estimator.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data

Background and purpose: By evolving science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...

متن کامل

A Stepwise Regression Method and Consistent Model Selection for High-dimensional Sparse Linear Models by Ching-kang Ing

We introduce a fast stepwise regression method, called the orthogonal greedy algorithm (OGA), that selects input variables to enter a p-dimensional linear regression model (with p >> n, the sample size) sequentially so that the selected variable at each step minimizes the residual sum squares. We derive the convergence rate of OGA as m = mn becomes infinite, and also develop a consistent model ...

متن کامل

Some Two-Step Procedures for Variable Selection in High-Dimensional Linear Regression

We study the problem of high-dimensional variable selection via some two-step procedures. First we show that given some good initial estimator which is l∞-consistent but not necessarily variable selection consistent, we can apply the nonnegative Garrote, adaptive Lasso or hard-thresholding procedure to obtain a final estimator that is both estimation and variable selection consistent. Unlike th...

متن کامل

The group lasso for logistic regression

The group lasso is an extension of the lasso to do variable selection on (predefined) groups of variables in linear regression models. The estimates have the attractive property of being invariant under groupwise orthogonal reparameterizations. We extend the group lasso to logistic regression models and present an efficient algorithm, that is especially suitable for high dimensional problems, w...

متن کامل

A Stepwise Regression Method and Consistent Model Selection for High-dimensional Sparse Linear Models

We introduce a fast stepwise regression method, called the orthogonal greedy algorithm (OGA), that selects input variables to enter a p-dimensional linear regression model (with p À n, the sample size) sequentially so that the selected variable at each step minimizes the residual sum squares. We derive the convergence rate of OGA and develop a consistent model selection procedure along the OGA ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Bernoulli : official journal of the Bernoulli Society for Mathematical Statistics and Probability

دوره 16 4  شماره 

صفحات  -

تاریخ انتشار 2010